Sciunits: Reusable Research Objects

Author: Fils Gabriel
Malik Tanu
That Dai Hai Ton
Yuan Zhihao
Publication date: 11/09/2017

Science is conducted collaboratively, often requiring knowledge sharing about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. In this paper, we present the sciunit, a reusable research object in which aggregated content is recomputable. We describe a Git-like client that efficiently creates, stores, and repeats sciunits. We show through analysis that sciunits repeat computational experiments with minimal storage and processing overhead. Finally, we provide an overview of sharing and reproducible cyberinfrastructure based on sciunits gaining adoption in the domain of geosciences.

arXiv.org e-Print Archive

Crossref

Utilizing Provenance in Reusable Research Objects

Author: Fils Gabriel
Kothari Siddhant
Malik Tanu
That Dai Hai Ton
Yuan Zhihao
Publication venue: 'MDPI AG'
Publication date: 01/03/2018

Science is conducted collaboratively, often requiring the sharing of knowledge about computational experiments. When experiments include only datasets, they can be shared using Uniform Resource Identifiers (URIs) or Digital Object Identifiers (DOIs). An experiment, however, seldom includes only datasets, but more often includes software, its past execution, provenance, and associated documentation. The Research Object has recently emerged as a comprehensive and systematic method for aggregation and identification of diverse elements of computational experiments. While a necessary method, mere aggregation is not sufficient for the sharing of computational experiments. Other users must be able to easily recompute on these shared research objects. Computational provenance is often the key to enable such reuse. In this paper, we show how reusable research objects can utilize provenance to correctly repeat a previous reference execution, to construct a subset of a research object for partial reuse, and to reuse existing contents of a research object for modified reuse. We describe two methods to summarize provenance that aid in understanding the contents and past executions of a research object. The first method obtains a process-view by collapsing low-level system information, and the second method obtains a summary graph by grouping related nodes and edges with the goal to obtain a graph view similar to application workflow. Through detailed experiments, we show the efficacy and efficiency of our algorithms. Comment: 25 pages

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data

Author: Gray Jim
Kunszt Peter Z.
Malik Tanu
Raddick Jordan
Stoughton Christopher
Szalay Alexander
Thakar Ani
vandenBerg Jan
Publication date: 07/11/2001

The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating the SkyServer on the Internet. The SDSS data is public and well-documented so it makes a good test platform for research on database algorithms and performance. Comment: submitted for publication, original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2001-10

arXiv.org e-Print Archive

CERN Document Server

Adaptive Physical Design for Curated Archives

Author: Ailamaki Anastasia
Burns Randal
Chaudhary Amitabh
Dash Debabrata
Malik Tanu
Wang Xiaodan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009

We introduce AdaptPD, an automated physical design tool that improves database performance by continuously monitoring changes in the workload and adapting the physical design to suit the incoming workload. Current physical design tools are offline and require specification of a representative workload. AdaptPD is “always on” and incorporates online algorithms which profile the incoming workload to calculate the relative benefit of transitioning to an alternative design. Efficient query and transition cost estimation modules allow AdaptPD to quickly decide between various design configurations. We evaluate AdaptPD with the SkyServer Astronomy database using queries submitted by SkyServer’s users. Experiments show that AdaptPD adapts to changes in the workload, improves query performance substantially over offline tools, and introduces minor computational overhead.

Infoscience - École polytechnique fédérale de Lausanne

Crossref

The Second Data Release of the Sloan Digital Sky Survey

Author: Abazajian Kevork
Adelman-McCarthy Jennifer K.
Agüeros Marcel Andre
Allam Sahar S.
Anderson Kurt S. J.
Anderson Scott F.
Annis James
Bahcall Neta A.
Baldry Ivan K.
Bastian Steven
Berlind Andreas
Bernardi Mariangela
Blanton Michael R.
Bochanski Jr., John J.
Boroski William N.
Briggs John W.
Brinkmann J.
Brunner Robert J.
Budavári Tamás
Carey Larry N.
Carliles Samuel
Castander Francisco J.
Connolly A. J.
Csabai István
Doi Mamoru
Dong Feng
Eisenstein Daniel J.
Evans Michael L.
Fan Xiaohui
Finkbeiner Douglas P.
Friedman Scott D.
Frieman Joshua A.
Fukugita Masataka
Gal Roy R.
Gillespie Bruce
Glazebrook Karl
Gray Jim
Grebel Eva K.
Gunn James E.
Gurbani Vijay K.
Hall Patrick B.
Hamabe Masaru
Harris Frederick H.
Harris Hugh C.
Harvanek Michael
Heckman Timothy M.
Hendry John S.
Hennessy Gregory S.
Hindsley Robert B.
Hogan Craig J.
Hogg David W.
Holmgren Donald J.
Ichikawa Shin-ichi
Ichikawa Takashi
Ivezić Željko
Jester Sebastian
Johnston David E.
Jorgensen Anders M.
Kent Stephen M.
Kleinman S. J.
Knapp G. R.
Kniazev Alexei Yu.
Kron Richard G.
Krzesiński Jurek
Kunszt Peter Z.
Kuropatkin Nickolai
Lamb Donald Q.
Lampeitl Hubert
Lee Brian C.
Leger R. French
Li Nolan
Lin Huan
Loh Yeong-Shang
Long Daniel C.
Loveday Jon
Lupton Robert H.
Malik Tanu
Margon Bruce
Matsubara Takahiko
McGehee Peregrine M.
McKay Timothy A.
Meiksin Avery
Munn Jeffrey A.
Nakajima Reiko
Nash Thomas
Neilsen Jr., Eric H.
Newberg Heidi Jo
Newman Peter R.
Nichol Robert C.
Nicinski Tom
Nieto-Santisteban Maria
Nitta Atsuko
O'Mullane William
Okamura Sadanori
Ostriker Jeremiah P.
Owen Russell
Padmanabhan Nikhil
Peoples John
Pier Jeffrey R.
Pope Adrian C.
Quinn Thomas R.
Richards Gordon T.
Richmond Michael W.
Rix Hans-Walter
Rockosi Constance M.
Schlegel David J.
Schneider Donald P.
Scranton Ryan
Sekiguchi Maki
Seljak Uroš
Sergey Gary
Sesar Branimir
Sheldon Erin
Shimasaku Kazu
Siegmund Walter A.
Silvestri Nicole M.
Sirko Edwin
Smith J. Allyn
Smolčić Vernesa
Snedden Stephanie A.
Stebbins Albert
Stoughton Chris
Strauss Michael A.
SubbaRao Mark
Szalay Alexander S.
Szapudi István
Szkody Paula
Szokoly Gyula P.
Tegmark Max
Teodoro Luis
Thakar Aniruddha R.
Tremonti Christy
Tucker Douglas L.
Uomoto Alan
Vanden Berk Daniel E.
Vandenberg Jan
Vogeley Michael S.
Voges Wolfgang
Vogt Nicole P.
Walkowicz Lucianne M.
Wang Shu-i
Weinberg David H.
West Andrew A.
White Simon D. M.
Wilhite Brian C.
Xu Yongzhong
Yanny Brian
Yasuda Naoki
Yip Ching-Wa
Yocum D. R.
York Donald G.
Zehavi Idit
Zibetti Stefano
Zucker Daniel B.
Publication venue: 'University of Chicago Press'
Publication date: 01/01/2004

The Sloan Digital Sky Survey (SDSS) has validated and made publicly available its Second Data Release. This data release consists of 3324 deg² of five-band (ugriz) imaging data with photometry for over 88 million unique objects, 367,360 spectra of galaxies, quasars, stars, and calibrating blank sky patches selected over 2627 deg² of this area, and tables of measured parameters from these data. The imaging data reach a depth of r ≈ 22.2 (95% completeness limit for point sources) and are photometrically and astrometrically calibrated to 2% rms and 100 mas rms per coordinate, respectively. The imaging data have all been processed through a new version of the SDSS imaging pipeline, in which the most important improvement since the last data release is fixing an error in the model fits to each object. The result is that model magnitudes are now a good proxy for point-spread function magnitudes for point sources, and Petrosian magnitudes for extended sources. The spectroscopy extends from 3800 to 9200 Å at a resolution of 2000. The spectroscopic software now repairs a systematic error in the radial velocities of certain types of stars and has substantially improved spectrophotometry. All data included in the SDSS Early Data Release and First Data Release are reprocessed with the improved pipelines and included in the Second Data Release. Further characteristics of the data are described, as are the data products themselves and the tools for accessing them.

Repository of Faculty of Science, University of Zagreb

Columbia University Academic Commons

University of Zagreb Repository

Macquarie University ResearchOnline

CERN Document Server

The First Data Release of the Sloan Digital Sky Survey

Author: Miknaitis Gajus A.
Abazajian Kevork
Adelman-McCarthy Jennifer K.
Agüeros Marcel A.
Allam Sahar S.
Anderson Scott F.
Berlind Andreas
Annis James
Bahcall Neta A.
Baldry Ivan K.
Bastian Steven
Bernardi Mariangela
Blanton Michael R.
Blythe Norman
Bochanski John J.
Boroski William N.
Brewington Howard
Briggs John W.
Brinkmann J.
Margon Bruce
Brunner Robert J.
Budavari Tamas
Harris Hugh C.
Carey Larry N.
Carr Michael A.
Castander Francisco J.
Chiu Kuenley
Collinge Matthew J.
Connolly A. J.
Covey Kevin R.
Csabai István
Dodelson Scott
Doi Mamoru
Dong Feng
Eisenstein Daniel J.
Gonzalez Carlos F.
Fan Xiaohui
Feldman Paul D.
Finkbeiner Douglas P.
Friedman Scott D.
Frieman Joshua A.
Fukugita Masataka
Gal Roy R.
Gillespie Bruce
Glazebrook Karl
Gray Jim
Grebel Eva K.
Grodnicki Lauren
Gunn James E.
Hall Patrick B.
Hao Lei
Harbeck Daniel
Harris Frederick H.
Harvanek Michael
Hawley Suzanne L.
Heckman Timothy M.
Helmboldt J. F.
Hendry John S.
Hennessy Gregory S.
Hindsley Robert B.
Hogg David W.
Holtzman Jon A.
Homer Lee
Hui Lam
Ichikawa Shin-ichi
Ichikawa Takashi
Inkmann John P.
Dalcanton Julianne J.
Holmgren Donald J.
Yarger Jean
Jester Sebastian
Johnston David E.
Jordan Beatrice
Jordan Wendell P.
Jorgensen Anders M.
Krzesinski Jurek
Jurić Mario
Gurbani Vijay K.
Kauffmann Guinevere
Kleinman S. J.
Knapp G. R.
Kniazev Alexei Yu.
Kron Richard G.
Kunszt Peter Z.
Kuropatkin Nickolai
Evans Michael L.
Zakamska Nadia L.
Lamb Donald Q.
Lampeitl Hubert
Laubscher Bryan E.
Lee Brian C.
Leger R. French
Li Nolan
Lidz Adam
Lin Huan
Loh Yeong-Shang
Long Daniel C.
Loveday Jon
Lupton Robert H.
Kent Stephen M.
Malik Tanu
McGehee Peregrine M.
McKay Timothy A.
Meiksin Avery
Odenkirchen Michael
Moorthy Bhasker K.
Munn Jeffrey A.
Murphy Tara
Nakajima Reiko
Narayanan Vijay K.
Nash Thomas
Neilsen Eric H. Jr.
Newberg Heidi Jo
Newman Peter R.
Nichol Robert C.
Nicinski Tom
Nieto-Santisteban Maria
Padmanabhan Nikhil
Nitta Atsuko
Okamura Sadanori
Ostriker Jeremiah P.
Owen Russell
Schneider Donald P.
Peoples John
Pier Jeffrey R.
Pindor Bartosz
Pope Adrian C.
Quinn Thomas R.
Rafikov R. R.
Raymond Sean N.
Richards Gordon T.
Richmond Michael W.
Rix Hans-Walter
Rockosi Constance M.
Schaye Joop
Schlegel David J.
Schroeder Joshua
Scranton Ryan
Sekiguchi Maki
Seljak Uros
Sergey Gary
Sesar Branimir
Sheldon Erin
Shimasaku Kazu
Siegmund Walter A.
Silvestri Nicole M.
Sinisgalli Allan J.
Sirko Edwin
Smith J. Allyn
Smolčić Vernesa
Snedden Stephanie A.
Stebbins Albert
Steinhardt Charles
Stinson Gregory
Stoughton Chris
Strateva Iskra V.
Strauss Michael A.
SubbaRao Mark
Szalay Alexander S.
Szapudi István
Szkody Paula
Tasca Lidia
Tegmark Max
Thakar Aniruddha R.
Tremonti Christy
Tucker Douglas L.
Uomoto Alan
Vanden Berk Daniel E.
Vandenberg Jan
Vogeley Michael S.
Vogt Nicole P.
Walkowicz Lucianne M.
Weinberg David H.
West Andrew A.
White Simon D. M.
Wilhite Brian C.
Willman Beth
Voges Wolfgang
Xu Yongzhong
Yanny Brian
Yasuda Naoki
Yip Ching-Wa
Yocum D. R.
York Donald G.
Zehavi Idit
Zheng Wei
Zibetti Stefano
Zucker Daniel B.
Ivezić Željko
Publication venue: 'University of Chicago Press'
Publication date: 01/01/2003

The Sloan Digital Sky Survey has validated and made publicly available its First Data Release. This consists of 2099 square degrees of five-band (u, g, r, i, z) imaging data, 186,240 spectra of galaxies, quasars, stars and calibrating blank sky patches selected over 1360 square degrees of this area, and tables of measured parameters from these data. The imaging data go to a depth of r ~ 22.6 and are photometrically and astrometrically calibrated to 2% rms and 100 milli-arcsec rms per coordinate, respectively. The spectra cover the range 3800–9200 Å, with a resolution of 1800–2100. Further characteristics of the data are described, as are the data products themselves. Comment: Submitted to The Astronomical Journal. 16 pages. For associated documentation, see http://www.sdss.org/dr

arXiv.org e-Print Archive

Crossref

Repository of Faculty of Science, University of Zagreb

Columbia University Academic Commons

University of Zagreb Repository

CERN Document Server

Estimating query result sizes for proxy caching in scientific database federations

Author: Tanu Malik
Publication date: 01/01/2006

In a proxy cache for federations of scientific databases it is important to estimate the size of a query before making a caching decision. With accurate estimates, near-optimal cache performance can be obtained. At the other extreme, inaccurate estimates can render the cache totally ineffective. We present classification and regression over templates (CAROT), a general method for estimating query result sizes, which is suited to the resource-limited environment of proxy caches and the distributed nature of database federations. CAROT estimates query result sizes by learning the distribution of query results, not by examining or sampling data, but from observing workload. We have integrated CAROT into the proxy cache of the National Virtual Observatory (NVO) federation of astronomy databases. Experiments conducted in the NVO show that CAROT dramatically outperforms conventional estimation techniques and provides near-optimal cache performance.

CiteSeerX